Data & Predictions

First column: Number of favorites (blue) and retweets (orange) for xGPhilosophy’s end-of-match xG summary tweets since the account started tweeting regularly around the beginning of 2020.

Example:

Second column: Actual vs. predicted number of favorites and retweets.

Third column: Percent rank of the difference between the actual and predicted number of favorites/retweets.

Subtracting the models’ predictions from the actual counts gives an “over expected” number that quantifies how surprising a given tweet’s engagement is. For example, a tweet that is predicted to receive 100 favorites and 10 retweets, but actually receives 200 favorites and 30 retweets, is 100 favorites and 20 retweets over expected: it received more attention than the models anticipated. This may tell us that the game was particularly interesting, perhaps due to something like a controversial, late-game VAR decision.
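As a rough illustration of the two quantities described above, the sketch below computes the "over expected" difference and its percent rank in plain Python. The function names and the sample numbers are hypothetical, and the percent rank definition used here (minimum rank scaled to the range 0 to 1) is an assumption; the dashboard may compute it differently.

```python
def over_expected(actual, predicted):
    """Actual minus predicted count for each tweet."""
    return [a - p for a, p in zip(actual, predicted)]


def percent_rank(values):
    """Percent rank in [0, 1], assuming the (min_rank - 1) / (n - 1) definition.

    Each value's score is the share of the other values that are strictly
    smaller than it, so tied values receive the same score.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [sum(1 for u in values if u < v) / (n - 1) for v in values]


# Hypothetical favorite counts for three tweets (not real data).
actual_favorites = [200, 90, 150]
predicted_favorites = [100, 100, 120]

diffs = over_expected(actual_favorites, predicted_favorites)
ranks = percent_rank(diffs)

for diff, rank in zip(diffs, ranks):
    print(f"over expected: {diff:+d}, percent rank: {rank:.2f}")
```

A percent rank near 1 marks a tweet whose engagement exceeded the prediction by more than almost every other tweet, while a value near 0 marks the largest shortfalls.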

Prediction Explanation

First row: SHAP values for the prediction of a selected tweet. A positive SHAP value indicates that the feature contributed to making the prediction greater than the baseline (i.e. the average prediction).

The number of followers for the xGPhilosophy account at the time of the tweet is often the most important feature. For more recent tweets, its SHAP value tends to be positive. This is expected given that what we are trying to predict (number of favorites/retweets) is highly correlated with the number of xGPhilosophy followers.

Second row: Aggregated SHAP values for the selected tweet. If the positive and negative SHAP values together sum to more than 0, then the predicted number is greater than the baseline (the average prediction).
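The aggregation relies on the additive property of SHAP values: the baseline plus the sum of all SHAP values equals the model's output for that tweet. The sketch below illustrates this with made-up numbers. Apart from the follower count, the feature names are hypothetical, as are the helper functions. It also assumes the SHAP values are on the same scale as the prediction; if a model predicts a transformed target (for example, log counts), the identity holds on that transformed scale.

```python
def aggregate_shap(shap_values):
    """Return (sum of positive values, sum of negative values, net sum)."""
    positive = sum(v for v in shap_values.values() if v > 0)
    negative = sum(v for v in shap_values.values() if v < 0)
    return positive, negative, positive + negative


def prediction_from_shap(baseline, shap_values):
    """SHAP additivity: prediction = baseline + sum of SHAP values."""
    return baseline + sum(shap_values.values())


# Hypothetical SHAP values for one tweet (illustrative, not real output).
shap_values = {
    "followers": 40.0,           # follower count at the time of the tweet
    "xg_difference": -15.0,      # hypothetical feature name
    "is_weekend": 5.0,           # hypothetical feature name
}
baseline = 120.0  # hypothetical average prediction

positive, negative, net = aggregate_shap(shap_values)
prediction = prediction_from_shap(baseline, shap_values)

print(f"positive: {positive}, negative: {negative}, net: {net}")
print(f"prediction: {prediction} (above baseline: {net > 0})")
```

Here the net SHAP value is positive, so the prediction lands above the baseline, which is the case the second row highlights.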

More recent tweets are much more likely to have an aggregate positive SHAP value.

Third row: Like the actual vs. predicted plot on the Data & Predictions tab, but just for the selected tweet. This helps contextualize the SHAP value.
